Abstract. We develop a martingale approximation approach to studying the limiting 
behavior of quadratic forms of Markov chains. We use the technique to examine the 
asymptotic behavior of lag-window estimators in time series and we apply the results to 
Markov Chain Monte Carlo simulation. As another illustration, we use the method to 
derive a central limit theorem for U-statistics with varying kernels. 

1. Introduction 
This paper deals with quadratic forms of the type 

$$U_n(h_n) = \sum_{\ell=1}^{n} \sum_{j=1}^{\ell} w_n(\ell,j)\, h_n(X_\ell, X_j), \qquad n \ge 1, \tag{1}$$

for a stochastic process $\{X_n,\ n \ge 0\}$, weight matrices $w_n : \mathbb{N} \times \mathbb{N} \to \mathbb{R}$ and symmetric 
kernels $h_n : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$. Quadratic forms of possibly time-dependent random variables 
naturally arise in a variety of statistical and econometric problems, and their asymptotic 
properties are of particular importance for developing asymptotically valid inference procedures. 

For independent sequences $\{X_n,\ n \ge 0\}$, the well known Hoeffding decomposition 
provides a useful approach to studying the asymptotic properties of $U_n(h_n)$ because it 
decomposes the statistic into two (uncorrelated) martingale sequences, which are then easily 
handled by standard martingale theory. See, e.g., Serfling (1980) for a review. When the 
process $\{X_n,\ n \ge 0\}$ is time-dependent, however, the classical Hoeffding decomposition 
is not very useful because the resulting representation does not have the desirable 
martingale property in general. As a consequence, the large sample properties of quadratic 
forms of time-dependent random variables are typically established in a less systematic 
way. The best understood case is that of a standard U-statistic where $h_n$ does 
not depend on $n$ and $w_n(i,j) = 1$ if $i \ne j$ and $0$ otherwise (Yoshihara (1976); Eagleson 

2000 Mathematics Subject Classification. 60J10, 62M10. 

Key words and phrases. Central limit theorems, Markov Chains, Markov Chain Monte Carlo, Martingale 
approximations, Quadratic forms, U-statistics. 

Y. F. Atchade: University of Michigan, Department of Statistics, Ann Arbor, 48109, MI, United States. 
E-mail address: yvesa@umich.edu. 

Matias D. Cattaneo: University of Michigan, Department of Economics, Ann Arbor, 48109, MI, United 
States. E-mail address: cattaneo@umich.edu. 






YVES F. ATCHADE AND MATIAS D. CATTANEO 



(1979); Dehling and Wendler (2010)). There has been some recent progress. Hsing and 
Wu (2004) consider $U_n(h)$ where neither $h_n$ nor $w_n$ depends on $n$, whereas Wu and Shao 
(2007) study $U_n(h_n)$ when $h_n(x,y) = h(x,y) = xy$ for a martingale-difference sequence 
(see also Bhansali et al. (2007) for i.i.d. sequences). 

We develop a martingale approximation for $U_n(h_n)$ which allows for a general and 
systematic analysis of $U_n(h_n)$ when $\{X_n,\ n \ge 0\}$ is a Markov chain. Martingale 
approximation is a well established technique when dealing with linear partial sums of dependent 
processes (Maxwell and Woodroofe (2000); Merlevede et al. (2006)), but has not been fully 
explored in dealing with quadratic forms (a notable exception is Wu and Shao (2007)). 
In the present paper, we obtain an approximating quadratic martingale to $U_n(h_n)$ from a 
solution of a bivariate analog of the well known Poisson's equation. 

As an application we study the asymptotic behavior of lag-window estimators of the 
long-run variance (asymptotic variance) for Markov chains (see, e.g., Priestley (1981)). We 
obtain a decomposition of lag-window estimators that sheds some new light on the 
asymptotic behavior of these estimators, particularly by contrasting the classical asymptotics 
and the so-called "fixed-b" asymptotics (Neave (1970); Kiefer and Vogelsang (2005)). We 
derive two theorems that extend existing results. We obtain the consistency of lag-window 
estimators for non-geometrically ergodic Markov chains, extending recent results of Flegal 
and Jones (2010) and Atchade (2011); and we extend the "fixed-b" asymptotics 
framework to handle non-stationary Markov chains. These results have important implications 
for Markov Chain Monte Carlo (MCMC) simulations, offering in particular new robust 
procedures for constructing Monte Carlo confidence intervals. 

As another application of the martingale approximation method, we derive a central 
limit theorem for U-statistics with varying kernels without imposing stationarity and under 
assumptions that are more easily verifiable. In particular, we do not rely on mixing 
conditions. 

The paper is organized as follows. The rest of the introduction outlines the general 
setup and introduces the main notation employed throughout, while Section 2 derives the 
main martingale approximation method. Section 3 derives the asymptotic properties of 
lag-window estimators and, in particular, applies these results to MCMC simulation. We 
study U-statistics with varying kernels in Section 4. All the proofs are presented in Section 
5. 

1.1. Setup and Notation. Throughout the paper, $\{X_n,\ n \ge 0\}$ denotes a Markov chain 
taking values in a general state space $(\mathcal{X}, \mathcal{B})$ equipped with a countably generated 
sigma-algebra $\mathcal{B}$. We denote by $P$ the transition kernel of the Markov chain and $\mu$ its invariant 
distribution whose existence is assumed. Unless explicitly stated otherwise, $\{X_n,\ n \ge 0\}$ 
is a nonstationary Markov chain with initial distribution $\rho$. 

We will rely on the following general notation. Let $(T, \mathcal{A})$ be an arbitrary 
measurable space. If $W : T \to [1, +\infty)$ is a function, the $W$-norm of a function $f : T \to \mathbb{R}$ 
is defined as $|f|_W := \sup_{x \in T} |f(x)|/W(x)$. The set of measurable functions $f : T \to \mathbb{R}$ 
with finite $W$-norm is denoted by $\mathcal{L}_W(T)$, or simply $\mathcal{L}_W$ when there is no ambiguity on the 
space $T$. For a finite real-valued signed measure $\nu$ on $T$, we denote the $W$-norm of $\nu$ as 

$$\|\nu\|_W := \int W(x)\,|\nu|(dx) = \sup_{|f|_W \le 1} \int f(x)\,\nu(dx),$$

where $|\nu|$ is the total variation measure of $\nu$. We denote by $\mathcal{M}_W(T)$ the space of all finite 
real-valued signed measures $\nu$ on $T$ such that $\|\nu\|_W < \infty$. It is well-known that $(\mathcal{M}_W(T), \|\cdot\|_W)$ 
is a Banach space. When the measure space $T$ is understood, we simply write $\mathcal{M}_W$. We 
will use the notation $\nu(f)$ to denote the integral $\int f(x)\,\nu(dx)$. If $\mu$, $\nu$ are two finite signed 
measures on $(T, \mathcal{A})$, we denote their product by $\mu \otimes \nu$, and the product of a finite 
number $k$ of finite signed measures $\nu_1, \dots, \nu_k$ is denoted by $\bigotimes_{j=1}^{k} \nu_j$. 

If $Q$ is a transition kernel on $(T, \mathcal{A})$, its iterates are defined as follows: $Q^0$ is the identity 
kernel ($Q^0(x,A) = \mathbf{1}_A(x)$) and for $n \ge 1$, we define $Q^n(x,\cdot) = \int Q(x,dz)\,Q^{n-1}(z,\cdot)$. If 
$h : T \times T \to \mathbb{R}$ is a bivariate function then $Qh$ is the bivariate function defined by the rule 
$Qh(x,y) = \int Q(x,dz)\,h(z,y)$ and $Q^2h$ is defined as $Q^2h(x_1,x_2) = \int Q(x_1,dz_1) \int Q(x_2,dz_2)\,h(z_1,z_2)$. 
If $h : T \to \mathbb{R}$ is univariate, $Qh$ is defined similarly as $Qh(x) = \int Q(x,dz)\,h(z)$. Fix $Q$ a 
Markov kernel, and $V : T \times T \to [1, \infty)$. For $p \ge 1$ and a function $h : T \times T \to \mathbb{R}$, we 
define 

$$|||h|||_{p,V} := \sup_{x,y \in T} \frac{\left\{\int Q(x,dz)\,|h(z,y)|^p\right\}^{1/p}}{V(x,y)}.$$

For a univariate function $V : T \to [1,\infty)$ and for $h : T \to \mathbb{R}$, we define $|||h|||_{p,V}$ similarly 
as 

$$|||h|||_{p,V} := \sup_{x \in T} V^{-1}(x) \left\{\int Q(x,dz)\,|h(z)|^p\right\}^{1/p}.$$

When we use the notation $|||h|||_{p,V}$ below, it will always be with respect to $P$, the Markov 
kernel of the reference process $\{X_n,\ n \ge 0\}$, unless stated otherwise. The following 
short-range dependence concept will play an important role. 

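Definition 1.1. Fix $r \in \mathbb{N}$. For measurable functions $V_r \le W_r : T^r \to [1,\infty)$, we say that 
the transition kernel $Q$ with invariant distribution $\mu$ satisfies the condition $C(r, V_r, W_r)$ if 
there exists a finite constant $c$ such that 

$$\sum_{n_1 \ge 0} \cdots \sum_{n_r \ge 0} \left\| \bigotimes_{j=1}^{r} \left(Q^{n_j}(x_j,\cdot) - \mu\right) \right\|_{V_r} \le c\,W_r(x_1,\dots,x_r), \qquad (x_1,\dots,x_r) \in T^r. \tag{2}$$
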

Throughout the paper, we denote by $c$ a finite constant which depends solely on the 
kernel $P$ but whose actual value can change from one equation to the next. In particular 
$c$ does not depend on the family of functions $\{h_n,\ n \ge 1\}$ considered. Finally, all limits are 
taken as $n \to \infty$ unless explicitly noted otherwise. 
2. A MARTINGALE APPROXIMATION FOR QUADRATIC FORMS 

For notational convenience, we shall write $\bar\mu$ to denote the product probability measure 
$\bar\mu(du,dv) = \mu(du)\,\mu(dv)$, where $\mu$ is the invariant distribution of the Markov kernel $P$. 
Consider the following assumption. 

Assumption A1 There exist symmetric measurable functions $V_2 \le W_2 : \mathcal{X} \times \mathcal{X} \to 
[1,\infty)$ such that $P$ satisfies $C(2, V_2, W_2)$. Furthermore, $P^sW_2(x) < \infty$ for all 
$x \in \mathcal{X}^2$ and for $s \in \{1, 2\}$. 

Remark 1. It is always possible to deduce A1 from a univariate short-range dependence 
assumption. Indeed, if $P$ satisfies $C(1, V_1, W_1)$ and $C(1, V_2, W_2)$, and $PW_1 < \infty$, $PW_2 < \infty$, 
define $V_2(x,y) = V_1(x)V_2(y)$ and $W_2(x,y) = W_1(x)W_2(y)$. Then 

$$\left\|(P^n(x,\cdot) - \mu) \otimes (P^m(y,\cdot) - \mu)\right\|_{V_2} = \left\|P^n(x,\cdot) - \mu\right\|_{V_1} \left\|P^m(y,\cdot) - \mu\right\|_{V_2}.$$

Thus 

$$\sum_{n \ge 0} \sum_{m \ge 0} \left\|(P^n(x,\cdot) - \mu) \otimes (P^m(y,\cdot) - \mu)\right\|_{V_2} \le c\,W_2(x,y),$$

and therefore A1 holds. 

Remark 2. The univariate condition $C(1, V, W)$ holds for geometrically ergodic Markov 
kernels (that is, kernels $P$ for which $\|P^n(x,\cdot) - \mu\|_V$ converges to zero exponentially fast for 
some $V \ge 1$). It also holds for sub-geometrically ergodic Markov kernels ($\|P^n(x,\cdot) - \mu\|_V$ 
converges to zero sub-geometrically) for which the rate of convergence is summable. It is 
sometimes possible to check the condition $C(1, V, W)$ using Lyapunov drift conditions and 
their extensions, and this has been done for several time series Markov models (Douc et al. 
(2004); Meitz and Saikkonen (2008); Meyn and Tweedie (2009)). 

We show that whenever A1 holds, there exists a martingale approximation to $U_n(h_n)$ 
that offers a simple route to study the asymptotics of $U_n(h_n)$. The space $\mathcal{M}_{V_2}(\mathcal{X} \times \mathcal{X})$ 
of all finite signed measures on $\mathcal{X} \times \mathcal{X}$ with finite $\|\cdot\|_{V_2}$ norm, equipped with the norm 
$\|\cdot\|_{V_2}$, is a Banach space. Under A1 and for any $x, y \in \mathcal{X}$, 

$$R_2(x,y;(du,dv)) := \sum_{n_1 \ge 0} \sum_{n_2 \ge 0} \left(P^{n_1}(x,du) - \mu(du)\right)\left(P^{n_2}(y,dv) - \mu(dv)\right)$$

is a finite signed measure that belongs to $\mathcal{M}_{V_2}(\mathcal{X} \times \mathcal{X})$. Furthermore we have for all 
$x, y \in \mathcal{X}$, 

$$\|R_2(x,y;\cdot)\|_{V_2} \le c\,W_2(x,y). \tag{3}$$

Let $h : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be a symmetric measurable function such that $\bar\mu(|h|) < \infty$. Denote 
$\theta = \int\!\!\int h(x,y)\,\mu(dx)\,\mu(dy)$ and define 

$$h_1(x) := \int h(x,z)\,\mu(dz) - \theta, \qquad h_2(x,y) = h(x,y) - h_1(x) - h_1(y) - \theta,$$
$$G_2(x,y) := \int\!\!\int R_2(x,y;dz_1,dz_2)\,h_2(z_1,z_2), \qquad x, y \in \mathcal{X}.$$

We say that $h$ is degenerate when $h_1$ is identically zero. For $x \in \mathcal{X}$, $\delta_x$ denotes the Dirac 
measure at $x$. 

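Lemma 2.1. Assume A1. Suppose that $h_2 \in \mathcal{L}_{V_2}$. Then $G_2$ is well-defined, $G_2 \in \mathcal{L}_{W_2}$, 
and $|G_2|_{W_2} \le c\,|h_2|_{V_2}$, and for all $x, y \in \mathcal{X}$, 

$$h_2(x,y) = \int \left(\delta_x(dz_1) - P(x,dz_1)\right) \int \left(\delta_y(dz_2) - P(y,dz_2)\right) G_2(z_1,z_2). \tag{4}$$

If in addition $P^sh_2 \in \mathcal{L}_{V_2}$, then $|P^sG_2|_{W_2} \le c\,|P^sh_2|_{V_2}$ for $s \in \{1,2\}$. 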

Proof. See Section 5.1. □ 

Remark 3. Equation (4) gives a bivariate Poisson's equation which extends the well known 
univariate Poisson's equation. 
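
The bivariate Poisson equation (4) can be checked numerically when the state space is finite. The following is a minimal sketch under assumptions that are not part of the paper: a toy 5-state chain with a randomly generated transition matrix, a random symmetric kernel, and the closed form $\sum_{n \ge 0}(P^n - \Pi) = (I - P + \Pi)^{-1} - \Pi$, where $\Pi$ is the matrix whose rows all equal $\mu$. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 5                                    # size of the toy state space (assumption)

# Random transition matrix with strictly positive entries, hence ergodic.
P = rng.random((k, k))
P /= P.sum(axis=1, keepdims=True)

# Invariant distribution mu: normalized left eigenvector for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
mu = np.real(evecs[:, np.argmax(np.real(evals))])
mu /= mu.sum()
Pi = np.tile(mu, (k, 1))                 # every row of Pi equals mu

# Symmetric kernel h and its decomposition into theta, h1 and h2.
A = rng.random((k, k))
h = A + A.T
theta = mu @ h @ mu
h1 = h @ mu - theta
h2 = h - h1[:, None] - h1[None, :] - theta

# Z = sum_{n >= 0} (P^n - Pi), so that G2 = Z h2 Z^T is the matrix form of
# G2(x, y) = int int R2(x, y; dz1, dz2) h2(z1, z2).
I = np.eye(k)
Z = np.linalg.inv(I - P + Pi) - Pi
G2 = Z @ h2 @ Z.T

# Right-hand side of (4): (delta - P) applied in each coordinate of G2.
rhs = (I - P) @ G2 @ (I - P).T
poisson_error = np.max(np.abs(rhs - h2))
print(poisson_error)
```

In this finite setting the identity holds up to floating-point error because $h_2$ integrates to zero against $\mu$ in each coordinate, which is exactly the degeneracy used in the proof.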

We introduce the function 

$$\Lambda_2(x_1,x_2;y_1,y_2) := \int \left(\delta_{y_1}(dz_1) - P(x_1,dz_1)\right) \int \left(\delta_{y_2}(dz_2) - P(x_2,dz_2)\right) G_2(z_1,z_2)$$
$$= G_2(y_1,y_2) - PG_2(x_2,y_1) - PG_2(x_1,y_2) + P^2G_2(x_1,x_2), \qquad x_1,x_2,y_1,y_2 \in \mathcal{X}.$$

Then (4) can be written as $h_2(x,y) = \Lambda_2(x,y;x,y)$. An especially important property of $\Lambda_2$ 
that we rely on in the sequel is the following. For any $x, y, u, v \in \mathcal{X}$, it is easy to see that 

$$\int P(x,dy)\,\Lambda_2(u,x;v,y) = \int P(u,dv)\,\Lambda_2(u,x;v,y) = 0. \tag{5}$$

Now suppose that we have $\{h_n : \mathcal{X} \times \mathcal{X} \to \mathbb{R}\}$, a family of symmetric measurable 
functions such that $\bar\mu(|h_n|) < \infty$. We write $\theta_n$, $h_{n,1}$, $h_{n,2}$, $G_{n,2}$, and $\Lambda_{n,2}$ to denote respectively 
the quantities $\theta$, $h_1$, $h_2$, $G_2$, and $\Lambda_2$ defined above with $h = h_n$. 

For $1 \le j \le \ell \le n$, we introduce the random variables 

$$Q_{n,\ell,j} := \Lambda_{n,2}\left(X_{j-1}, X_{\ell-1}; X_j, X_\ell\right).$$

For $j < \ell$, and by the Markov property and (5), we have 

$$\mathbb{E}\left(Q_{n,\ell,j} \mid \mathcal{F}_{\ell-1}\right) = \int P(X_{\ell-1}, dz)\,\Lambda_{n,2}\left(X_{j-1}, X_{\ell-1}; X_j, z\right) = 0,$$

almost surely. This shows that $\{(\sum_{j=1}^{\ell-1} Q_{n,\ell,j}, \mathcal{F}_\ell),\ 2 \le \ell \le n\}$ is a martingale-difference 
array. We need the following sequences 

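$$w_{n,1}(\ell) := \sum_{j=1}^{\ell} w_n(\ell,j) + \sum_{j=\ell}^{n} w_n(j,\ell), \qquad w_n^{(1)}(\ell,j) := w_n(\ell,j) - w_n(\ell-1,j),$$

$$w_n^{(2)}(\ell,j) := w_n(\ell,j) - w_n(\ell,j-1),$$
and $w_n^{(3)}(\ell,j) := w_n(\ell,j) + w_n(\ell-1,j-1) - w_n(\ell,j-1) - w_n(\ell-1,j)$. 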
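Lemma 2.2. Assume A1 and suppose that $h_{n,2} \in \mathcal{L}_{V_2}$, $n \ge 1$. Then 

$$U_n(h_n) = U_{n,0} + \sum_{\ell=1}^{n} \left\{w_{n,1}(\ell)\,h_{n,1}(X_\ell) + w_n(\ell,\ell)\,Q_{n,\ell,\ell}\right\} + \sum_{\ell=1}^{n} \sum_{j=1}^{\ell-1} w_n(\ell,j)\,Q_{n,\ell,j} + \zeta_n, \tag{6}$$

where $U_{n,0} = \theta_n \sum_{\ell=1}^{n} \sum_{j=1}^{\ell} w_n(\ell,j)$, and 

$$\zeta_n = \sum_{\ell=1}^{n} \sum_{j=1}^{\ell} w_n^{(1)}(\ell,j) \left(PG_{n,2}(X_{\ell-1},X_j) - P^2G_{n,2}(X_{\ell-1},X_{j-1})\right)$$
$$+ \sum_{\ell=1}^{n} \sum_{j=1}^{\ell} w_n^{(2)}(\ell,j) \left(PG_{n,2}(X_{j-1},X_\ell) - P^2G_{n,2}(X_{\ell-1},X_{j-1})\right)$$
$$+ \sum_{\ell=1}^{n} \sum_{j=1}^{\ell} w_n^{(3)}(\ell,j)\,P^2G_{n,2}(X_{\ell-1},X_{j-1}) + \epsilon_n,$$

where 

$$\epsilon_n = \sum_{\ell=1}^{n} \left(w_n(\ell,0)\,PG_{n,2}(X_0,X_\ell) - w_n(\ell-1,0)\,P^2G_{n,2}(X_0,X_{\ell-1})\right)$$
$$+ \sum_{j=1}^{n} w_n(n,j) \left(P^2G_{n,2}(X_n,X_j) - PG_{n,2}(X_n,X_j)\right)$$
$$+ \sum_{\ell=1}^{n} \left(w_n(\ell-1,\ell)\,PG_{n,2}(X_{\ell-1},X_\ell) - w_n(\ell,\ell)\,PG_{n,2}(X_\ell,X_\ell)\right).$$
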

Proof. See Section 5.2. □ 

Remark 4. The usefulness of this decomposition comes from the fact that the remainder 
$\zeta_n$ involves either single summations or difference sequences of the weights $w_n$. As 
a result, these remainders are typically negligible compared to the other terms in the 
decomposition and one can easily study the asymptotic behavior of $U_n(h_n)$ by focusing 
on the linear term $\sum_{\ell=1}^{n} \{w_{n,1}(\ell)\,h_{n,1}(X_\ell) + w_n(\ell,\ell)\,Q_{n,\ell,\ell}\}$ and the quadratic martingale 
$\sum_{\ell=1}^{n} \sum_{j=1}^{\ell-1} w_n(\ell,j)\,Q_{n,\ell,j}$. 

3. Application: Asymptotic variance estimation 

In this section, we use the martingale approximation of Lemma 2.2 to study the 
asymptotics of lag-window estimators of the asymptotic variance in time series and we apply the 
results to Markov chain Monte Carlo. Let $h : \mathcal{X} \to \mathbb{R}$ be a measurable function such that 
< $\infty$. We assume without any loss of generality that $\mu(h) = 0$. We are interested 
in the estimation of the long-run variance (or the asymptotic variance) of $h$ defined as: 

$$\sigma^2(h) = \mathrm{Var}_\mu(h(X_0)) + 2 \sum_{\ell \ge 1} \mathrm{Cov}_\mu\left(h(X_0), h(X_\ell)\right), \tag{7}$$

which plays a role in time series analysis and in Markov Chain Monte Carlo. A classical 
estimator for $\sigma^2(h)$ is the lag-window estimator defined as 

$$\Gamma^2_{n,b}(h) := \gamma_{n,0} + 2 \sum_{k=1}^{n-1} w_b(k c_n^{-1})\,\gamma_{n,k}, \tag{8}$$

where $\gamma_{n,k} := n^{-1} \sum_{j=1}^{n-k} (h(X_j) - \mu_n(h))(h(X_{j+k}) - \mu_n(h))$ is the $k$-th order sample 
autocovariance with $\mu_n(h) = n^{-1} \sum_{k=1}^{n} h(X_k)$, $w_b$ is a weight function (with a parameter $b$) 
and $\{c_n,\ n \ge 1\}$ is an increasing sequence of positive numbers. We refer the reader to 
Priestley (1981) for a detailed discussion of lag-window estimators. We consider weight 
functions with the following properties. 

Assumption W: For $b > 0$, $w_b : [0, \infty) \to [0, 1]$ is a continuous function with 
support $[0, b]$, of class on the interval $(0, b)$, such that $w_b(b) = 0$ and $w_b(0) = 1$. 

This assumption allows for the use of all commonly employed weighting functions, 
including the Bartlett and Parzen kernels. When $w_b(x) = w(x/b)$, an equivalent parametrization 
of $\Gamma^2_{n,b}(h)$ is $\Gamma^2_{n,b}(h) = \gamma_{n,0} + 2 \sum_{k=1}^{n-1} w(k/\tilde{c}_n)\,\gamma_{n,k}$, with $\tilde{c}_n = b c_n$. We impose the 
following ergodicity assumption. 

Assumption A2 There exist measurable functions $V_k : \mathcal{X} \to [1,\infty)$ ($k = 1,2,3$), $V_1 \le 
V_2$, $V_2 \le V_3$, such that $PV_3(x) < \infty$ for all $x \in \mathcal{X}$, and $P$ satisfies the assumptions 
$C(1, V_1, V_2)$ and $C(1, V_2^2, V_3)$. Furthermore there exists $q > 1$ such that 

$$\sup_{n \ge 0} \mathbb{E}\left(V_3^q(X_n)\right) < \infty. \tag{9}$$


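A2 implies A1 with $V_2(x,y) = V_1(x)V_1(y)$ and $W_2(x,y) = V_2(x)V_2(y)$. Define the 
partial sums $S_{n,k} := \sum_{j=k+1}^{n+k} h(X_j)$ and the weights $w_{n,b}(0) = n^{-1}$ and $w_{n,b}(k) = 2n^{-1} w_b(k c_n^{-1})$ 
for $k > 0$. We can rewrite $\gamma_{n,k}$ as 

$$\gamma_{n,k} = n^{-1} \sum_{j=1}^{n-k} h(X_j)h(X_{j+k}) + n^{-3}(n-k)\,S_{n,0}^2 - n^{-2} S_{n,0} \left(S_{n-k,0} + S_{n-k,k}\right),$$

so that 

$$\Gamma^2_{n,b}(h) = \sum_{\ell=1}^{n} \sum_{j=1}^{\ell} w_{n,b}(\ell - j)\,h(X_j)h(X_\ell) + R_n, \tag{10}$$
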

where 

n— 1 



(n J-1 n-l n-j \ 

j=2 k=l j=l k=l ) 

If we set aside the term $R_n$, the expression (10) is of the form (1) with $h_n(x,y) = h(x)h(y)$ 
and $w_n(\ell,j) = w_{n,b}(\ell - j)$. Here we have $h_{n,1}(x) = \int h(x)h(y)\,\mu(dy) = 0$, $\theta_n = 0$, and 
$h_{n,2}(x,y) = h(x)h(y)$. Define 

$$G(x) := \sum_{j \ge 0} P^j h(x), \quad \text{and} \quad PG(x) = \int P(x,dz)\,G(z), \qquad x \in \mathcal{X}.$$

Then $G_2(x,y) = G(x)G(y)$, $PG_2(x,y) = PG(x)G(y)$, and $P^2G_2(x,y) = PG(x)PG(y)$. 
Therefore 

$$Q_{n,\ell,j} = Q_\ell Q_j, \quad \text{where } Q_\ell = G(X_\ell) - PG(X_{\ell-1}).$$

As above, $\{(Q_\ell, \mathcal{F}_\ell),\ \ell \ge 1\}$ is a martingale-difference sequence: $\mathbb{E}(Q_\ell \mid \mathcal{F}_{\ell-1}) = 0$. From Lemma 2.2 we obtain 
the following. 

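Theorem 3.1. Assume (A2) and (W) and $h \in \mathcal{L}_{V_1}$. For all $n \ge 1$, 

$$\Gamma^2_{n,b}(h) = n^{-1} \sum_{\ell=1}^{n} Q_\ell^2 + \sum_{\ell=1}^{n} \sum_{j=1}^{\ell-1} w_{n,b}(\ell - j)\,Q_\ell Q_j + R_n + \zeta_n. \tag{12}$$

Furthermore, there exist $p > 1$ and a finite constant $c$ such that for all $n \ge 3$, 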



and E^P 



n e-1 

^ ^ wn,b{i - j)QeQj 
i=l j=l 



n ' 



Proof. See Section 5.3. □ 

A clearer picture of the behavior of the lag-window estimator emerges from this result. 
For $p > 2$, we have 

n n I— I 

rlb{h)=n-'^Qj + ^^Wn,b{i-j)QeQj+Rn+ Cn , (13) 

e=i i=i j=i '~^/2 

V ' ^ ' Op(c„ ) 

Op{l) Or,(^/^+'^) 
^ V V n n / 

By the law of large numbers for Markov chains the term $n^{-1} \sum_{\ell=1}^{n} Q_\ell^2$ converges to $\sigma^2(h)$. 
As a result, Theorem 3.1 implies that $\Gamma^2_{n,b}(h)$ converges in probability to $\sigma^2(h)$ provided 
$c_n \to \infty$, $c_n = o(n)$ and $p > 2$ (for $1 < p < 2$, a specific rate assumption on $c_n$ might be 
needed). The decomposition (13) also gives some insight into the well known fact that 
$\Gamma^2_{n,b}(h)$ often has poor finite-sample properties in estimating $\sigma^2(h)$, particularly for highly 
correlated time series. Indeed, for $c_n = o(n)$, both terms $R_n + \sum_{\ell=1}^{n} \sum_{j=1}^{\ell-1} w_{n,b}(\ell - j)\,Q_\ell Q_j$ 
and $\zeta_n$ converge to zero but at antagonistic rates. If $c_n \sim n$, then $\zeta_n \sim O_p(n^{-1/2})$ but 
then $R_n + \sum_{\ell=1}^{n} \sum_{j=1}^{\ell-1} w_{n,b}(\ell - j)\,Q_\ell Q_j \sim O(1)$. Whereas for $c_n \ll n$, the convergence of 
$\zeta_n$ is slow ($\zeta_n = O_p(c_n^{-1/2})$) but $R_n + \sum_{\ell=1}^{n} \sum_{j=1}^{\ell-1} w_{n,b}(\ell - j)\,Q_\ell Q_j$ vanishes quickly. 

When the goal is to construct a confidence interval for $\mu(h)$ (and one is not interested in 
estimating $\sigma^2(h)$ per se), it has been suggested to use the lag-window estimator $\Gamma^2_{n,b}(h)$ 
with $c_n = n$, the so-called "fixed-b asymptotics" (Neave (1970); Kiefer and Vogelsang 
(2005)). With $c_n = n$, $\Gamma^2_{n,b}(h)$ no longer converges to $\sigma^2(h)$, but as it turns out, 
asymptotically valid confidence intervals can still be derived for $\mu(h)$. We have the following. 
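Theorem 3.2. Under the assumptions of Theorem 3.1, the following holds true. 

(1) If $p > 2$ and $c_n = o(n)$, then $\Gamma^2_{n,b}(h)$ converges in probability to $\sigma^2(h)$. 
Furthermore, assuming $\Gamma^2_{n,b}(h) > 0$ almost surely, 

$$\left\{n\,\Gamma^2_{n,b}(h)\right\}^{-1/2} \sum_{\ell=1}^{n} \left(h(X_\ell) - \mu(h)\right) \xrightarrow{w} N(0,1).$$

(2) Let $\{B(t),\ 0 \le t \le 1\}$ be the standard Brownian motion. If $c_n = n$, then $\Gamma^2_{n,b}(h) \xrightarrow{w} 
\sigma^2(h) K_b$, where 

$$K_b = 1 + 2 \int_0^1 \int_0^t w_b(t - s)\,dB(s)\,dB(t) - 2B(1) \int_0^1 g_b(t)\,dB(t) + 2B^2(1) \int_0^1 (1 - t)\,w_b(t)\,dt,$$

where $g_b(t) = \int_0^t w_b(u)\,du + \int_0^{1-t} w_b(u)\,du$. Furthermore, assuming $\Gamma^2_{n,b}(h) > 0$ 
almost surely, 

$$\left\{n\,\Gamma^2_{n,b}(h)\right\}^{-1/2} \sum_{\ell=1}^{n} \left(h(X_\ell) - \mu(h)\right) \xrightarrow{w} \frac{B(1)}{\sqrt{K_b}}.$$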



Proof See Section 5.4. □ 

By Theorem 3.2 (1) an asymptotically valid $(1 - \alpha)$-confidence interval for $\mu(h)$ is 

$$\mu_n(h) \pm z_{1-\alpha/2}\,\frac{\sigma_n(h)}{\sqrt{n}}, \tag{14}$$

where $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$-quantile of the standard normal distribution and where 
$\sigma_n(h) = \sqrt{\Gamma^2_{n,b}(h)}$, with $b = 1$, $c_n = o(n)$. A typical choice of $c_n$ is $c_n = n^\delta$, 
with $\delta \in (0, 1)$ typically around 0.5. Theorem 3.2 (2) provides another asymptotically valid 
confidence interval for $\mu(h)$: 

$$\mu_n(h) \pm t_{1-\alpha/2}\,\frac{\sigma_n(h)}{\sqrt{n}}, \tag{15}$$

where $t_{1-\alpha/2}$ is the $(1 - \alpha/2)$-quantile of the distribution of $B(1)/\sqrt{K_b}$ and where $\sigma_n(h) = 
\sqrt{\Gamma^2_{n,b}(h)}$, with $c_n = bn$, with $b \in (0, 1)$. 
Although the limiting distribution $B(1)/\sqrt{K_b}$ is non-standard, it can be simulated, for 
example by Euler discretization of the stochastic integrals in $K_b$. We report in Table 
1 the 95% quantiles of the distribution of $B(1)/\sqrt{K_b}$ using $w_b(x) = \mathbf{1}_{(0,b)}(x)$, $w_b(x) = 
(1 - x/b)\mathbf{1}_{(0,b)}(x)$ and $w_b(x) = (1 - (x/b)^2)\mathbf{1}_{(0,b)}(x)$, and for different values of $b$, based 
on 10,000 replications of $B(1)/\sqrt{K_b}$. The distribution departs further from the standard 
normal distribution as $b$ increases. 
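
The Euler discretization mentioned above can be sketched as follows. This is an illustrative implementation and not the authors' code: the grid size, the number of replications, the Riemann-sum approximations of the deterministic integrals, and the function names `bartlett` and `simulate_fixed_b_ratio` are all assumptions, and only the Bartlett weight is implemented.

```python
import numpy as np

def bartlett(x, b):
    """Bartlett weight w_b(x) = 1 - x/b on [0, b) and 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= 0) & (x < b), 1.0 - x / b, 0.0)

def simulate_fixed_b_ratio(b, n_steps=400, n_rep=4000, seed=0):
    """Draw approximate samples of B(1)/sqrt(K_b) by Euler discretization."""
    rng = np.random.default_rng(seed)
    m = n_steps
    dB = rng.normal(scale=np.sqrt(1.0 / m), size=(n_rep, m))  # Brownian increments
    B1 = dB.sum(axis=1)                                        # B(1)

    lags = np.arange(m)
    w = bartlett(lags / m, b)                                  # w[k] = w_b(k/m)

    # Ito double integral: sum over j < i of w_b((i - j)/m) dB_j dB_i.
    diff = lags[:, None] - lags[None, :]
    W = np.where(diff > 0, w[np.abs(diff)], 0.0)
    double_int = np.sum((dB @ W.T) * dB, axis=1)

    # g_b(t_i) = int_0^{t_i} w_b + int_0^{1 - t_i} w_b, via Riemann sums.
    cum = np.concatenate(([0.0], np.cumsum(w[1:]) / m))        # cum[i] ~ int_0^{i/m} w_b
    i = np.arange(1, m + 1)
    g = cum[i - 1] + cum[m - i]
    single_int = dB @ g

    drift = np.sum((1.0 - lags[1:] / m) * w[1:]) / m           # int_0^1 (1 - t) w_b(t) dt

    K = 1.0 + 2.0 * double_int - 2.0 * B1 * single_int + 2.0 * B1**2 * drift
    keep = K > 0            # discard rare non-positive draws caused by discretization
    return B1[keep] / np.sqrt(K[keep])

# Example: approximate 0.975-quantile for the Bartlett weight with b = 0.5.
print(np.quantile(simulate_fixed_b_ratio(0.5), 0.975))
```

The estimate depends on the grid size and the number of replications, so it should only be expected to land in the neighborhood of the Bartlett column of Table 1.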

In the next simulation examples, we compare the finite sample properties of these 
two confidence intervals in terms of coverage probability and interval length. All the 
simulations are performed using the Bartlett kernel w{x) = 1 — x. 
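
For concreteness, the estimator (8) with the Bartlett weight and the two intervals (14) and (15) can be computed along the following lines. This is a hedged sketch rather than the code behind the reported simulations: the function names and the i.i.d. test data are assumptions, and the fixed-b critical value 3.557 is the Table 1 entry for the Bartlett weight with $b = 0.5$.

```python
import numpy as np

def lag_window_variance(x, c_n, b=1.0):
    """Lag-window estimator (8) with Bartlett weight w_b(x) = 1 - x/b."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()                        # centered observations
    total = d @ d / n                       # gamma_{n,0}
    for k in range(1, n):
        w = 1.0 - k / (b * c_n)             # w_b(k / c_n)
        if w <= 0.0:                        # weights vanish beyond lag b * c_n
            break
        total += 2.0 * w * (d[:-k] @ d[k:]) / n   # 2 w_b(k/c_n) gamma_{n,k}
    return total

def confidence_interval(x, c_n, b, crit):
    """Interval mean +/- crit * sqrt(Gamma^2 / n), as in (14) and (15)."""
    n = len(x)
    half = crit * np.sqrt(lag_window_variance(x, c_n, b) / n)
    center = float(np.mean(x))
    return center - half, center + half

# Illustrative data only: i.i.d. standard normal draws.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
classical = confidence_interval(x, c_n=len(x) ** 0.5, b=1.0, crit=1.96)
fixed_b = confidence_interval(x, c_n=len(x), b=0.5, crit=3.557)
print(classical, fixed_b)
```

The direct loop costs on the order of $n \cdot b\,c_n$ operations, which is fine for a sketch; an FFT-based autocovariance would be a natural replacement for long chains in the fixed-b case.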
             w_b(x) = 1 - x/b    w_b(x) = 1 - (x/b)^2    w_b(x) = 1_{(0,b)}(x)
b = 0.3      2.828               4.134                   5.496
b = 0.5      3.557               6.580                   6.299
b = 0.9      4.735               12.575                  13.045

Table 1. 0.975-quantile of the distribution of $B(1)/\sqrt{K_b}$ 



3.1. Illustration: the GARCH(1,1) model. Consider the linear GARCH(1,1) model 
defined as follows: $h_0 \in (0, \infty)$, $u_0 \sim N(0, h_0)$ and for $n \ge 1$, 

$$u_n = h_n^{1/2}\,\varepsilon_n, \qquad h_n = \omega + \beta h_{n-1} + \alpha u_{n-1}^2,$$

where $\{\varepsilon_n,\ n \ge 0\}$ is i.i.d. $N(0, 1)$ and $\omega > 0$, $\alpha > 0$, $\beta > 0$. We assume that $\alpha, \beta$ satisfy 
CI There exists > such that 



E[{(3 + aZY]<h ^~AA(0,1). 



(16) 



It is shown by Meitz and Saikkonen (2008) (Theorem 2) that under (16) the joint process 
{{un,hn), n > 0} is a phi-irreducible aperiodic Markov chain that admits an invariant 
distribution and is geometrically ergodic with a drift function V{u,h) = 1 + /i'^ + \u\'^'^ . 
Therefore for > 2, A2 holds with Vi = V2 = and V3 = V. We are interested in 

a confidence interval for $\mu(h)$ where $h(u) = u^2$, which belongs to $\mathcal{L}_{V_1}$. The exact value is 
$\mu(h) = \omega(1 - \alpha - \beta)^{-1}$. 

For the simulations we set $\omega = 1$, $\alpha = 0.1$, $\beta = 0.7$, which gives $\mu(h) = 5$. We compare 
the confidence intervals (14) and (15) by computing (by Monte Carlo) their coverage 
probabilities and average lengths. The comparison is performed using sample paths of 
length 60,000 from the GARCH(1,1) Markov chain. The results are plotted in Figure 1 
and show across the board better coverage probability of the fixed-b confidence interval 
but, as expected, at the expense of slightly wider confidence intervals. 
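
A sample path of this chain can be generated with a few lines of code. The sketch below is illustrative only: the function name `simulate_garch11`, the initial value `h0 = 1.0` and the seed are assumptions, while the recursion and the parameter values $\omega = 1$, $\alpha = 0.1$, $\beta = 0.7$ are those stated above.

```python
import numpy as np

def simulate_garch11(n, omega=1.0, alpha=0.1, beta=0.7, h0=1.0, seed=0):
    """Simulate u_t = sqrt(h_t) eps_t, h_t = omega + beta h_{t-1} + alpha u_{t-1}^2."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + 1)       # i.i.d. N(0, 1) innovations
    u = np.empty(n + 1)
    h = np.empty(n + 1)
    h[0] = h0                              # arbitrary positive starting value
    u[0] = np.sqrt(h0) * eps[0]            # u_0 ~ N(0, h_0)
    for t in range(1, n + 1):
        h[t] = omega + beta * h[t - 1] + alpha * u[t - 1] ** 2
        u[t] = np.sqrt(h[t]) * eps[t]
    return u, h

u, h = simulate_garch11(60_000)
# The Monte Carlo average of u^2 estimates mu(h) = omega / (1 - alpha - beta).
print(np.mean(u ** 2))
```

The chain is started away from stationarity, which matches the non-stationary setting of the paper; the printed average should be close to, but not exactly, the stationary value.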



[Figure 1 panels: "Coverage probabilities" and "Average interval lengths", each comparing the classical approach and the fixed-b approach; horizontal axis: delta/b.] 

Figure 1: Coverage probabilities plots for various values of $\delta$ (classical confidence 
interval) and $b$ (fixed-b confidence interval). 

3.2. Markov Chain Monte Carlo. Markov Chain Monte Carlo (MCMC) is a popular 
computational tool to obtain random samples from intractable and high-dimensional 
distributions (see e.g. Roberts and Rosenthal (2004) for a survey and for additional 
references). 

Suppose that we are interested in sampling from the probability measure $\mu$ and computing the 
integral $\mu(h) = \int h(x)\,\mu(dx)$. Let $\{X_n,\ n \ge 0\}$ be a Markov chain with transition kernel 
$P$, invariant distribution $\mu$ and initial distribution $\rho$. By simulating the Markov chain, we 
approximate $\mu(h)$ by the Monte Carlo average $\mu_n(h) = n^{-1} \sum_{k=1}^{n} h(X_k)$. Furthermore, 
under A2, $\lim_{n \to \infty} n\,\mathrm{Var}(\mu_n(h)) = \sigma^2(h)$, as given by (7), and a central limit theorem 
holds: $n^{-1/2} \sum_{k=1}^{n} (h(X_k) - \mu(h)) \xrightarrow{w} N(0, \sigma^2(h))$. Therefore (14) and (15) provide two 
valid confidence intervals for $\mu(h)$. We compare the coverage probabilities and average 
interval lengths of these two confidence interval procedures with the following simulation 
example. 

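3.2.1. Illustration: a Poisson regression model. We undertake the comparison using a 
log-linear model taken from Gelman et al. (2004). For $e = 1, \dots, N_e$ and $p = 1, \dots, N_p$, 
the variables $y_{ep}$ are conditionally independent given $(\{\beta_p\}, \{\varepsilon_{ep}\}) \in \mathbb{R}^{N_p} \times \mathbb{R}^{N_e N_p}$, with 
conditional distribution 

$$y_{ep} \sim \mathcal{P}\left(n_{ep}\,e^{\mu + \alpha_e + \beta_p + \varepsilon_{ep}}\right), \qquad e = 1, \dots, N_e,\ p = 1, \dots, N_p, \tag{17}$$

where $\mathcal{P}(\lambda)$ is the Poisson distribution with parameter $\lambda$. In the above display, $\{n_{ep}\}$ is a 
deterministic baseline covariate, and $\mu \in \mathbb{R}$, $\{\alpha_e\} \in \mathbb{R}^{N_e}$ are parameters. We assume that 
$\{\beta_p\}$ and $\{\varepsilon_{ep}\}$ are independent with distributions 

$$\beta_p \sim N(0, \sigma_\beta^2), \qquad \varepsilon_{ep} \sim N(0, \sigma_\varepsilon^2), \qquad e = 1, \dots, N_e,\ p = 1, \dots, N_p, \tag{18}$$
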
for some parameters cr| > 0, cr^ > 0. We assume a diffuse prior for {ij,,a,a'^,af) 
(cr^ > 0, cr| > 0) with the additional constraint that = —'^k=i^'^k- Let 6 = 
(Ai, a, /3, e, (t^, aj) e R^+^e-i+C^p+me . The posterior distribution of 9 given V = {y^p, Uep) 
takes the form 

iT{e\D) cx exp i ^ ye,p{n + ae + Pp + ee,p) - nepei'+'^^+^^+'^^ 

log - f log - ^ i: 4 - 4 E "4 ■ (") 

^ e,p P p=l I 

This posterior distribution is typical of probability distributions for which MCMC is 
useful. We set $N_e = 3$ and $N_p = 20$. Suppose that we are interested in a confidence 
interval for the posterior mean of the parameter $\alpha_1$, i.e. $\int \alpha_1\,\pi(\theta \mid \mathcal{D})\,d\theta$. To compare the 
two confidence interval methods described above, we generate an artificial dataset with 
$(\alpha_1, \alpha_2, \mu, \sigma_\beta^2, \sigma_\varepsilon^2) = (0.35, 0.15, -1.0, 0.1, 0.3)$. We run a preliminary MCMC sampler for 
6 million ($6 \times 10^6$) iterations and compute its sample mean. We obtain $\alpha_1 = 0.3309$. We 
take this value to be $\int \alpha_1\,\pi(\theta \mid \mathcal{D})\,d\theta$. 

To compare the two confidence interval methods, we use a Random Walk Metropolis 
(RWM) algorithm with proposal kernel $N(0, kS)$ where $k$ and $S$ are selected (from a 
preliminary simulation) to yield a reasonably good mixing of the chain. We run the 
MCMC sampler for 60,000 iterations and discard the first 10,000 iterations as burn-in. 
We repeat the simulations 200 times in order to estimate the coverage probabilities and 
interval lengths. The results are given in Figure 2. We find again that in terms of finite 
sample behavior, the fixed-b confidence interval is more robust to the choice of $b$. 
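
A generic Random Walk Metropolis sampler with proposal $N(0, kS)$ can be sketched as below. This is not the sampler used for the Poisson regression posterior: the target here is a toy two-dimensional standard normal, and the function name, the value $k = 2.88$, $S$ equal to the identity, and the burn-in length are assumptions chosen only for illustration.

```python
import numpy as np

def random_walk_metropolis(log_target, x0, n_iter, k, S, seed=0):
    """Random Walk Metropolis with Gaussian proposal N(0, k * S)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    L = np.linalg.cholesky(k * np.asarray(S, dtype=float))  # proposal scale factor
    lp = log_target(x)
    chain = np.empty((n_iter, d))
    accepted = 0
    for i in range(n_iter):
        prop = x + L @ rng.standard_normal(d)                # symmetric proposal
        lp_prop = log_target(prop)
        # Accept with probability min(1, target(prop) / target(x)).
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepted += 1
        chain[i] = x
    return chain, accepted / n_iter

# Toy target (assumption): standard normal in dimension 2.
log_target = lambda z: -0.5 * float(z @ z)
chain, acc_rate = random_walk_metropolis(log_target, np.zeros(2), 20_000,
                                         k=2.88, S=np.eye(2))
kept = chain[2_000:]                                         # discard burn-in
print(acc_rate, kept.mean(axis=0))
```

Each coordinate of `kept` is a chain to which the intervals (14) and (15) can then be applied.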



[Figure 2 panels: "Coverage probabilities" and "Average interval lengths"; horizontal axis: delta/b, from 0.0 to 1.0.] 

Figure 2: Coverage probability and confidence interval length for $\alpha_1$ and for different 
values of $b$ and $\delta$. 



4. Application: a CLT for U-statistics with varying kernels 

U-statistics with varying kernels are a special case of quadratic forms and correspond 
to setting $w_n(\ell,\ell) = 0$ and $w_n(\ell,j) = 1$ if $\ell \ne j$. We thus have 

$$U_n(h_n) = \sum_{\ell=2}^{n} \sum_{j=1}^{\ell-1} h_n(X_\ell, X_j).$$

We illustrate another application of Lemma 2.2 by deriving a CLT for $U_n(h_n)$. U-statistics 
and U-statistics with varying kernels play an important role in nonparametric and 
semiparametric statistics. In the present case, under A1, Lemma 2.2 reduces to 

$$U_n(h_n) = \binom{n}{2}\theta_n + (n-1) \sum_{\ell=1}^{n} h_{n,1}(X_\ell) + \sum_{\ell=2}^{n} \sum_{j=1}^{\ell-1} Q_{n,\ell,j} + \zeta_n.$$

We impose the following moment assumption. 

1=1 e=2 1=1 
B1 With $V_2$ and $W_2$ as in A1, suppose that $PV_2 \le cV_2$, $PW_2^p \le cW_2^p$, and for 
$p = 2$, 

$$\sup \mathbb{E}\left(W_2^p(X_i, X_j)\right) < \infty. \tag{20}$$

We recall from Lemma 2.1 that for $s \in \{1,2\}$, $|P^sG_{n,2}(x,y)| \le c\,|P^sh_{n,2}|_{V_2} W_2(x,y)$, 
and 

$$|P^2h_{n,2}|_{V_2} = \sup_{x,y \in \mathcal{X}} \{V_2(x,y)\}^{-1} \left| \int P(x,du) \int P(y,dv)\,h_{n,2}(u,v) \right|$$
$$\le |Ph_{n,2}|_{V_2} \sup_{x,y \in \mathcal{X}} \{V_2(x,y)\}^{-1} \int P(x,du)\,V_2(u,y) \le c\,|Ph_{n,2}|_{V_2} \le c\,|||h_{n,2}|||_{p,V_2}$$

for all $p \ge 1$, using the assumption $PV_2 \le cV_2$ in B1. In combination with (20), and the 
expression of $\zeta_n$ in Lemma 2.2, it follows that 

By definition, = V2(^£, ^i)-i^G'„,2(^£-i, ^,)-PG„,2(X,_i, X^)+p2G„,2(^^-i, ^j-i; 

and assuming that Phn,2-, £ -^yj' obtains 

|Qn,Ajl < \hn,2{Xt,Xj)\^c\PK,2\v^ (W2(X^-1, X,) + W2(X,_i, X^)) 

+ C|p2/i^^2|_ Pr2(^£-l,^j-l). (22) 

Thus for j <i, 

E(q2 ,^^.|J-,_i) <4 y P(X,_i,dz)|/i,,2(^,^i)|' 

+ 4|||V2|||2,y,E (W2(X^_i,X,0 + W2(X^,X,-_i) + W2(X^_i,X,_i)|J-^_i) . 
Taking the expectation on both side and using (20), it follows that for all n > 1, 

E^/'(Q^,,,,)<cp„,2|||2,^^. (23) 
We impose an additional stability assumption. 

B2 With W2 as in Al, there exists measurable functions Wi < Vi : X ^ [1,00), 
a symmetric measurable function IA2 '■ X x X ^ [1,00) such that P satisfies 
C(l,Wi,Vi) and for all m > 0, all xi,X2,xs G X, 

J P{xi,dzi) J P"'{zi,dZ2) 

X (W2(Z2,Z1) + W2(z2,Xl)) {W2{Z2,X2)+W2{Z2,X3)) 

<cUl{xi)U2{x2,X3). (24) 

Furthermore, 

supE {Vi{Xe)U{Xe,Xe_,)) < oo. 
Remark 5. Assumptions B1-B2 are similar to the assumptions imposed in Dehling and 
Wendler (2010) (Theorem 1.8) to obtain a CLT for U-statistics of stationary dependent 
processes. Assumption B2 can be easy to check. For example if W2 is given by W2ix, y) = 
W{x)W{y) and P'^W < cW for all m > 0, then (24) holds with Ui{x) = PW'^{x) + 
W{x)PW{x) and U2{x, y) = W{x) + W{y). 

Define 

o'n.i := j {lJ'P}{dx,dy)Ll{x,y) 

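$$= \mathrm{Var}_\mu(h_{n,1}(X_0)) + 2 \sum_{\ell \ge 1} \mathrm{Cov}_\mu\left(h_{n,1}(X_0), h_{n,1}(X_\ell)\right), \quad \text{and} \quad \sigma_n^2 := n(n-1)^2 \sigma_{n,1}^2.$$
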

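Theorem 4.1. Assume A1, B1-B2 and let $\{h_n,\ n \ge 1\}$ be such that $h_{n,2} \in \mathcal{L}_{V_2}$. Suppose 
also that 

$$|||h_{n,2}|||_{2,V_2} = o\left(n^{1/2} \sigma_{n,1}\right). \tag{25}$$

Then 

$$\frac{1}{\sigma_n} \left[ U_n(h_n) - \theta_n \binom{n}{2} - (n-1) \sum_{\ell=1}^{n} h_{n,1}(X_\ell) \right] \to 0, \quad \text{in probability}.$$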

Proof. See Section 5.5. □ 

Remark 6. From the above result, it is clear that if $\frac{1}{\sqrt{n}\,\sigma_{n,1}} \sum_{\ell=1}^{n} h_{n,1}(X_\ell)$ converges weakly 
to $N(0, 1)$, then so does $\frac{1}{\sigma_n} \left(U_n(h_n) - \theta_n \binom{n}{2}\right)$. If $h_n$ does not depend on $n$, (25) 
automatically holds and Theorem 4.1 implies a standard CLT for U-statistics (Yoshihara (1976); 
Dehling and Wendler (2010)). But unlike these previous works, Theorem 4.1 does not 
assume stationarity and for Markov chains, the weak dependence assumption A1 is 
sometimes easier to check than mixing assumptions. 

The theorem describes the limiting behavior of $U_n(h_n)$ in the case where the kernels 
$h_n$ are not degenerate and the quadratic term $\sum_{\ell=2}^{n} \sum_{j=1}^{\ell-1} Q_{n,\ell,j}$ is negligible. In general, 
the quadratic term need not be negligible, in which case a correct account of the 
limiting behavior of $U_n(h_n)$ requires a joint study of the processes $\sum_{\ell=1}^{n} h_{n,1}(X_\ell)$ and 
$\sum_{\ell=2}^{n} \sum_{j=1}^{\ell-1} Q_{n,\ell,j}$. 


5. Proofs 

5.1. Proof of Lemma 2.1. That $G_2 \in \mathcal{L}_{W_2}$ and $|G_2|_{W_2} \le c\,|h_2|_{V_2}$ follows from (3). Set 

$$\pi_{n,m}(x,y;(du,dv)) = \left(P^n(x,du) - \mu(du)\right) \otimes \left(P^m(y,dv) - \mu(dv)\right).$$

Since $P^sW_2(x,y) < \infty$ for all $x, y \in \mathcal{X}$ and $s \in \{1,2\}$ by A1, we deduce that the rhs of 
(4) is well-defined and can be written as $G_2(x,y) - PG_2(y,x) - PG_2(x,y) + P^2G_2(x,y)$. 


QUADRATIC FORMS OF MARKOV CHAINS 






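By dominated convergence, 

$$G_2(x,y) - PG_2(y,x) - PG_2(x,y) + P^2G_2(x,y)$$
$$= \lim_{N,M \to \infty} \sum_{n=0}^{N} \sum_{m=0}^{M} \int \left(\delta_x(dz_1) - P(x,dz_1)\right) \int \left(\delta_y(dz_2) - P(y,dz_2)\right) \pi_{n,m}h_2(z_1,z_2)$$
$$= \lim_{N,M \to \infty} \left[ h_2(x,y) - \pi_{N+1,0}h_2(x,y) - \pi_{0,M+1}h_2(x,y) + \pi_{N+1,M+1}h_2(x,y) \right]$$
$$= h_2(x,y),$$

proving (4). The bound $|P^sG_2|_{W_2} \le c\,|P^sh_2|_{V_2}$ is obtained by showing in a similar way 
that 

$$P^sG_2(x,y) = \lim_{N,M \to \infty} \sum_{n=0}^{N} \sum_{m=0}^{M} \pi_{n,m}\{P^sh_2\}(x,y) = \int\!\!\int R_2(x,y;dz_1,dz_2)\,\{P^sh_2\}(z_1,z_2).$$

□ 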

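5.2. Proof of Lemma 2.2. From the definition, we have $h_n(x,y) = \theta_n + h_{n,1}(x) + h_{n,1}(y) + 
h_{n,2}(x,y)$, and we deduce after some rearrangements that 

$$U_n(h_n) = U_{n,0} + \sum_{\ell=1}^{n} w_{n,1}(\ell)\,h_{n,1}(X_\ell) + \sum_{\ell=1}^{n} \sum_{j=1}^{\ell} w_n(\ell,j)\,h_{n,2}(X_\ell,X_j).$$

Using (4), we write 

$$h_{n,2}(X_\ell,X_j) = \Lambda_{n,2}(X_j,X_\ell;X_j,X_\ell)$$
$$= Q_{n,\ell,j} + \Lambda_{n,2}(X_j,X_\ell;X_j,X_\ell) - \Lambda_{n,2}(X_{j-1},X_{\ell-1};X_j,X_\ell)$$
$$= Q_{n,\ell,j} + \left(PG_{n,2}(X_{\ell-1},X_j) - PG_{n,2}(X_\ell,X_j)\right)$$
$$+ \left(PG_{n,2}(X_{j-1},X_\ell) - PG_{n,2}(X_j,X_\ell)\right) + \left(P^2G_{n,2}(X_j,X_\ell) - P^2G_{n,2}(X_{j-1},X_{\ell-1})\right).$$

Rearranging the terms, it is easy to verify that 

$$\sum_{\ell=1}^{n} \sum_{j=1}^{\ell} w_n(\ell,j)\,h_{n,2}(X_\ell,X_j) = \sum_{\ell=1}^{n} \sum_{j=1}^{\ell} w_n(\ell,j)\,Q_{n,\ell,j}$$
$$+ \sum_{\ell=1}^{n} \sum_{j=1}^{\ell} w_n^{(1)}(\ell,j) \left(PG_{n,2}(X_{\ell-1},X_j) - P^2G_{n,2}(X_{j-1},X_{\ell-1})\right)$$
$$+ \sum_{\ell=1}^{n} \sum_{j=1}^{\ell} w_n^{(2)}(\ell,j) \left(PG_{n,2}(X_{j-1},X_\ell) - P^2G_{n,2}(X_{j-1},X_{\ell-1})\right)$$
$$+ \sum_{\ell=1}^{n} \sum_{j=1}^{\ell} w_n^{(3)}(\ell,j)\,P^2G_{n,2}(X_{j-1},X_{\ell-1}) + \epsilon_n,$$

where $\epsilon_n$ is comprised of the remainder telescoping sums. We obtain 

$$\epsilon_n = \sum_{\ell=1}^{n} \left(w_n(\ell,0)\,PG_{n,2}(X_0,X_\ell) - w_n(\ell-1,0)\,P^2G_{n,2}(X_0,X_{\ell-1})\right)$$
$$+ \sum_{j=1}^{n} w_n(n,j) \left(P^2G_{n,2}(X_n,X_j) - PG_{n,2}(X_n,X_j)\right)$$
$$+ \sum_{\ell=1}^{n} \left(w_n(\ell-1,\ell)\,PG_{n,2}(X_{\ell-1},X_\ell) - w_n(\ell,\ell)\,PG_{n,2}(X_\ell,X_\ell)\right).$$

□ 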

5.3. Proof of Theorem 3.1. A2 implies that we can find $p > 1$ such that 

SM^¥.(v^P{Xn)) <oo. (26) 

n>0 ^ ^ 

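From (10), $\Gamma^2_{n,b}(h) = \sum_{\ell=1}^{n} \sum_{j=1}^{\ell} w_{n,b}(\ell - j)\,h(X_j)h(X_\ell) + R_n$, where $w_{n,b}(0) = n^{-1}$ and 
$w_{n,b}(k) = 2n^{-1} w_b(k c_n^{-1})$ (in particular $w_{n,b}(\ell) = 0$ for $\ell < 0$). Set $\Delta^{(1)}_{n,b}(\ell) = w_{n,b}(\ell) - 
w_{n,b}(\ell - 1)$ and $\Delta^{(2)}_{n,b}(\ell) = 2w_{n,b}(\ell) - w_{n,b}(\ell + 1) - w_{n,b}(\ell - 1)$. Then Lemma 2.2 applied to 
$\sum_{\ell=1}^{n} \sum_{j=1}^{\ell} w_{n,b}(\ell - j)\,h(X_j)h(X_\ell)$ gives: 

$$\Gamma^2_{n,b}(h) = n^{-1} \sum_{\ell=1}^{n} Q_\ell^2 + \sum_{\ell=1}^{n} \sum_{j=1}^{\ell-1} w_{n,b}(\ell - j)\,Q_\ell Q_j + R_n + \zeta_n,$$

where 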

n I n i 

Cn = J2 pG{x,.i) aS(^ - j)Qj + E E - j + i)^^(^i-i) 

i=i 3=1 e=i 3=1 

n e-2 

+ E ^G(X,_i) E Ai^i(£ - i)PG(X,_i) 

^=3 j=l 



+ PG(Xo) <! E y^n^me + E aS(^)^G'(X,_i 

^=1 



- PG(X„) i E (" - - E - j)PG{Xj) 

n n 

+ aS(0) E (^G(^^-i))' - K,6(0) - a2(1)) E PG{Xe)Qe 

1=1 e=i 

- A^^)(l) iPG{Xn)f - wn,b{n - l)PGiXn)PG{Xo). 

Using A2, (26), the martingale-difference property of $\{Q_\ell,\ \ell \ge 1\}$, and the smoothness of 
$w_b$, we derive that for $p > 1$ as in (26), 

E^/^(|Cn^)<cc;'^^''^ n>3, (27) 

for some finite constant c. 
By martingale approximation for linear partial sums (see e.g. the proof of Proposition 
6.1 below), for any sequence of real numbers $\{a_{n,\ell},\ 1 \le \ell \le n\}$, 



where E 



\aA2 



and E(|e„,i|") < c |a„,i| + |a„,n| + ^\an/ - an,e-i\ 



(28) 



^=2 



provided sup„>o E (V2"(X„)) < oo. We use (28) to bound the term Rn as given in (11) 
and obtain for all n > 1: 

E{\Rnn<cn-P(F^, 

for some finite constant c. 

By standard martingale inequalities, we obtain the bound 

-I e-i 



E 



^^wn,b{i - j)QeQj 
1=1 j=i 



2 _E_|_lv2 



□ 



5.4. Proof of Theorem 3.2. If $c_n = o(n)$ and $p > 2$, then from Theorem 3.1, r^6(^) =

5^"=i Q£+op(l). Given the ergodicity assumption C(l, Vg^, V3) and supj.>o 'E,{y^{X};)) < 
00, it follows from Proposition 6.1 that the term J2e=i Q'e converges in probability to 
the limit 

J fi{dx) J Pix,dy) {G{y)-PG{x))\ 

which is easily seen to be equal to $\sigma^2(h)$. This proves the first part of the theorem.
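As a numerical illustration of the identity just used, the following sketch checks on a small chain that the averaged conditional variance $\int \mu(dx) \int P(x,dy)\,(G(y) - PG(x))^2$ agrees with the asymptotic variance computed from autocovariances. It assumes that $G$ solves the Poisson equation $G - PG = h - \mu(h)$ (taken here as the series $\sum_{j \ge 0} P^j(h - \mu(h))$) and that $\sigma^2(h)$ is the usual asymptotic variance $\mathrm{Var}_\mu(h) + 2\sum_{k \ge 1} \mathrm{Cov}_\mu(h(X_0), h(X_k))$. The two-state chain, the function `h`, the truncation levels and all function names are hypothetical choices made only for illustration.

```python
# Illustrative check on a hypothetical two-state chain (not taken from the paper).

def mat_vec(P, v):
    """Apply the transition kernel: (Pv)(x) = sum_y P(x, y) v(y)."""
    return [sum(P[x][y] * v[y] for y in range(len(v))) for x in range(len(P))]

def stationary(P, iters=2000):
    """Approximate the invariant distribution mu by power iteration."""
    m = len(P)
    mu = [1.0 / m] * m
    for _ in range(iters):
        mu = [sum(mu[x] * P[x][y] for x in range(m)) for y in range(m)]
    return mu

def poisson_solution(P, mu, h, terms=2000):
    """Truncated series G = sum_{j>=0} P^j (h - mu(h)) (assumed form of G)."""
    mean = sum(m * v for m, v in zip(mu, h))
    term = [v - mean for v in h]
    G = [0.0] * len(h)
    for _ in range(terms):
        G = [g + t for g, t in zip(G, term)]
        term = mat_vec(P, term)
    return G

def martingale_variance(P, mu, G):
    """Compute sum_x mu(x) sum_y P(x, y) (G(y) - PG(x))^2."""
    PG = mat_vec(P, G)
    m = len(G)
    return sum(
        mu[x] * sum(P[x][y] * (G[y] - PG[x]) ** 2 for y in range(m))
        for x in range(m)
    )

def autocovariance_variance(P, mu, h, terms=2000):
    """Compute Var_mu(h) + 2 * sum_{k>=1} Cov_mu(h(X_0), h(X_k)), truncated."""
    mean = sum(m * v for m, v in zip(mu, h))
    hbar = [v - mean for v in h]
    total = sum(m * v * v for m, v in zip(mu, hbar))
    term = mat_vec(P, hbar)  # P^k hbar, starting at k = 1
    for _ in range(terms):
        total += 2.0 * sum(m * a * b for m, a, b in zip(mu, hbar, term))
        term = mat_vec(P, term)
    return total

P = [[0.9, 0.1], [0.3, 0.7]]
h = [0.0, 1.0]
mu = stationary(P)
G = poisson_solution(P, mu, h)
sigma2_martingale = martingale_variance(P, mu, G)
sigma2_autocov = autocovariance_variance(P, mu, h)
print(sigma2_martingale, sigma2_autocov)
```

For this particular chain the two quantities coincide up to floating-point error (both are close to 0.75), which is what the identity predicts; other chains or functions can be substituted as long as the series converge.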

From now on, we assume that $c_n = n$. Define W„^^ = Then by (28) with

'^".^ = ^lih) ' 

_ ., . = } Wni + en 2, where e„ 2 converges in probability to zero. 

Define $\lfloor x \rfloor$ as the largest integer smaller than or equal to $x$ and, for $0 \le t \le 1$, we introduce

$$B_n(t) = \sum_{\ell=1}^{\lfloor nt \rfloor} W_{n,\ell} \quad \text{and} \quad Z_n(t) = \int_0^t w_b(t-u)\, dB_n(u).$$
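To make the two processes concrete, here is a small sketch that evaluates the partial-sum process $B_n(t)$ and the grid values of $Z_n$ for a given array of increments, and checks that the discretized stochastic integral $\sum_\ell W_{n,\ell} Z_n((\ell-1)/n)$ reproduces the off-diagonal double sum $\sum_\ell \sum_{j<\ell} w((\ell-j)/n) W_{n,\ell} W_{n,j}$. The grid formula $Z_n(\ell/n) = \sum_{j=0}^{\ell-1} w((\ell-j)/n) W_{n,j+1}$ with $c_n = n$, the Bartlett-type kernel and the numerical increments are assumptions made for illustration; none of the names below come from the paper.

```python
import math

def partial_sum_process(W, t):
    """B_n(t) = sum of W_{n,1}, ..., W_{n,floor(n t)}; W[l-1] stores W_{n,l}."""
    n = len(W)
    # Note: floor(n * t) is sensitive to rounding when n * t is not exact.
    return sum(W[: math.floor(n * t)])

def z_on_grid(W, w, ell):
    """Assumed grid formula: Z_n(ell/n) = sum_{j=0}^{ell-1} w((ell-j)/n) W_{n,j+1}."""
    n = len(W)
    return sum(w((ell - j) / n) * W[j] for j in range(ell))

def bartlett(u):
    """Hypothetical kernel choice, used only for this example."""
    return max(0.0, 1.0 - abs(u))

W = [1.0, -2.0, 0.5, 3.0]  # hypothetical increments W_{n,1}, ..., W_{n,4}
n = len(W)

# Discretized stochastic integral of Z_n against B_n.
stochastic_integral = sum(W[l - 1] * z_on_grid(W, bartlett, l - 1) for l in range(1, n + 1))

# Off-diagonal double sum over pairs j < l.
double_sum = sum(
    bartlett((l - j) / n) * W[l - 1] * W[j - 1]
    for l in range(1, n + 1)
    for j in range(1, l)
)
print(partial_sum_process(W, 0.5), z_on_grid(W, bartlett, 2), stochastic_integral, double_sum)
```

The agreement of the last two numbers is purely algebraic, so it holds for any increments and any kernel; it is the discrete counterpart of rewriting the off-diagonal part of the quadratic form as an integral of $Z_n$ against $B_n$.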

Since $B_n$ has jumps only at times $\ell/n = \ell/c_n$, we see that $Z_n(\ell c_n^{-1}) = \sum_{j=0}^{\ell-1} w_b((\ell - j)c_n^{-1})\, W_{n,j+1}$. It is also easy to see that the term in (11) can be written as



Rn = 2Bl{l) [ {l-u)wh{u)du 
Jo 



2B„(1) 



( / Wb{u)du+ / Wb{u)du ] dBn{t) + en,3, 
\Jo Jo 
where $\epsilon_{n,3}$ converges in probability to zero. Thus



e=i e=i ^ ^ " 0= 

n n 



n 

= a\h) + 2a\h) Wn,lZn ((^ - Ijc"^) + i?„ + Cn 

1=1 f=l 

= a2(/i) Vw^£ + 2a2(/i) / Z„(t)d5„(t) + 252(l)(72(/i) / Wbiu){l - u)du 
' Jo Jo 

-2Bn{l)a\h) f gb{u)dBn{t) + en,4, 
Jo 

where gb{u) = /g Wb{u)du + * Wb{u)du, and $\epsilon_{n,4}$ converges in probability to zero.

From the assumptions, $\sup_{\ell \ge 1} \mathbb{E}(|Q_\ell|^{2+\epsilon}) < \infty$ for some $\epsilon > 0$. Therefore, by the functional central limit theorem for martingales, $B_n \Rightarrow B$, where $B = \{B(t), 0 \le t \le 1\}$ is the standard Brownian motion. By the continuous mapping theorem, $(B_n, Z_n) \Rightarrow (B, Z)$, where $Z(t) = \int_0^t w(t-u)\, dB(u)$. And by the weak convergence of stochastic integrals (see, e.g., Theorem 2.2 in Kurtz and Protter (1991)),

I [D.n{t), 1^ Zn{t)dBn{t), 1^ 9{u)dBn{u), ^It^ ,0<t<l^ 

converges weakly to the stochastic process 

I [Bit), j Z{u)dB{u),J g{u)dB{u), ij , < i < l| . 

As the remainders $(\epsilon_{n,2}, \epsilon_{n,4})$ converge in probability to 0, this entails that (^Yll=i ' fcl^)

converges weakly to the limit 

a'^{h) [1 + 2 Z{u)dB(u) + 2B^{l) {I - u)w{u)du - 2B{1) I g{u)dB{t) 
\ Jo Jo Jo 



The conclusion of the theorem follows by the continuous mapping theorem.

5.5. Proof of Theorem 4.1. By (6), we have:

(21) gives 

(iCnP) < cn-V-j \\K,2ily^ = o(l) 
by (25). This shows that cr~^^„ converges in probability to zero. 



□ 
Now, by the martingale property, we have



E 



n e-1 

,i=2j=l 



e=2 



e—l \ n t—1 

n e-2 e-1 
+ E ^{QnAiQnAk) = o{al) 

e=2 k=l j=k+l 



by (23), Lemma 5.1 and (25). 



Lemma 5.1. Under the assumptions of Theorem 4.1,

-2 £-1 



IE [ X X Qn,e,jQn,e,k 

k=l j=k+l 



< cn III /in,2 III ' 3 < ^ < n. 



Proof. Fix 1 < k < i and define 



Tk = Tn,e,k := E I X Qn,e,jQn,e,k\^k-l ) , 



so that 



e-2 i-i 
IE I X X Qn,e,jQn,e, 

k=l j=k+l 



e,k 



k=l 



For m > 0, define 

T2,miXj-l,Xk-l,Xk) = J P{Xj-i,dXj) J P"'{Xj,dxe-i) 

X y P(a;£_i,dx^)A„,2(a;j-i,a;^-i;a;j,x^)A„,2(a;fc-i, 

Ti^rnixk-l,Xk) ■■= J {P"'{xk,dXj-l) - IJ,{dXj-l)} T2,e-m-k 

Then almost surely we have: 

(e-k-i \ 
X '^iji-^k, ^k-l)\J^k-l j ■ 

The bound (22) and the Cauchy-Schwarz inequality imply that



xe-i;xk,xi) 



{Xj — l, Xj~, Xk—\). 



P{xi^i, dxe)An,2ixj-i,Xi_i;xj,xe)An,2{xk-i,xe-i;xk, x. 

- ^ III V2|||2,F2 {^ixi-x,Xj) + W{xi-i,Xj-i)) {W{xi-x,Xk) + VF(x£_i,Xfe_i)) 
We combine this with B2 to conclude that for all m > 0, 

\T2,m{xj-l,Xk-\,Xk)\ < C \\\hn,2\\\ly^ Ul{xj-l)U2{xk, Xk-l) ■ 
By the short-range dependence assumption C(l, Wi, Vi), it follows that for any n > 0,



e-k-i 



j=0 



e-k-i 



< XI J {P^{xk,dxj^i) - iJ,{dxj-i)}T2/-j-k-2{xj-i,Xk-i,Xk) 



j=0 

for some finite constant c. We conclude that 

|2 



< C \\\hn,2\\\ly^VliXk)U2iXk,Xk-l), 



\Tk\ < C l\hn,2\lly^'^{Vl{Xk)U2{Xk,Xk-l)\Tk-l) . 



The lemma follows. 



□ 



6. Appendix A: A weak law of large numbers for Markov chains 



Proposition 6.1. Let $\{X_n, n \ge 0\}$ be a Markov chain with invariant distribution $\mu$ and transition kernel $P$. Suppose that there exist measurable functions $V_1 \le V_2 : \mathcal{X} \to [1, \infty)$ such that

$$\sum_{k \ge 0} \|P^k(x, \cdot) - \mu\|_{V_1} \le c\, V_2(x), \qquad x \in \mathcal{X}, \qquad (29)$$

for some finite constant $c$. Suppose also that $v_n := \mathbb{E}(V_2^p(X_n)) < \infty$ for each $n \ge 0$ and for some $p \in (1,2]$. Let $\{f_n, n \ge 1\}$ be such that $f_n, Pf_n \in L_{V_1}$ and let $\{a_{n,k}, 0 \le k \le n\}$ be a sequence of real numbers such that

/ n \-Pn 



vfe=l 



k=l 



and \Pfn\vi E |a„,fc| ^ \an,k - an,k-i\vk-i 0. 



\k=l 



k=l 



Then, as $n \to \infty$, $\left(\sum_{k=1}^n |a_{n,k}|\right)^{-1} \sum_{k=1}^n a_{n,k}\,(f_n(X_k) - \mu(f_n))$ converges in probability to zero.



Proof. Define $S_n = \sum_{k=1}^n a_{n,k}\,(f_n(X_k) - \mu(f_n))$ and $g_n(x) = \sum_{j \ge 0} (P^j f_n(x) - \mu(f_n))$. Under (29), $|g_n(x)| \le c\,|f_n|_{V_1} V_2(x)$ and $|Pg_n(x)| \le c\,|Pf_n|_{V_1} V_2(x)$. By the Poisson equation, $f_n(x) - \mu(f_n) = g_n(x) - Pg_n(x)$, which implies that

$$S_n = \sum_{k=1}^n a_{n,k}\,(g_n(X_k) - Pg_n(X_{k-1})) + \sum_{k=1}^n (a_{n,k} - a_{n,k-1})\,Pg_n(X_{k-1}) + \left(a_{n,0}\,Pg_n(X_0) - a_{n,n}\,Pg_n(X_n)\right),$$

where the martingale array $\sum_{k=1}^n a_{n,k}\,(g_n(X_k) - Pg_n(X_{k-1}))$ satisfies

/ n \ n 

E I E an,k (gniXk) - Pgn{Xk-l)) T < C ^ WnM''^- 



k=l 



k=l 
The last inequality follows by noting that $g_n(x) - Pg_n(y) = f_n(x) - \mu(f_n) + Pg_n(x) - Pg_n(y)$ and by conditioning on $\mathcal{F}_{k-1}$. Thus, under the stated assumptions, $\left(\sum_{k=1}^n |a_{n,k}|\right)^{-1} S_n$

converges in probability to zero. □
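The decomposition of $S_n$ used in this proof is a pathwise algebraic identity (Poisson equation plus summation by parts), so it can be checked numerically on any trajectory. The sketch below does this for a three-state chain, splitting $S_n$ into a martingale part, a weight-increment part and a boundary part. The transition matrix, its invariant distribution, the function `f`, the weights `a`, the sample path and the truncation level of the series for $g_n$ are all hypothetical values chosen for illustration.

```python
# Pathwise check of the Poisson-equation decomposition on a hypothetical chain.

def mat_vec(P, v):
    """Apply the transition kernel: (Pv)(x) = sum_y P(x, y) v(y)."""
    return [sum(P[x][y] * v[y] for y in range(len(v))) for x in range(len(P))]

P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
mu = [0.25, 0.5, 0.25]  # invariant distribution of P (mu P = mu)
f = [1.0, 0.0, 2.0]
mu_f = sum(m * v for m, v in zip(mu, f))

# g = sum_{j>=0} (P^j f - mu(f)), truncated after 200 terms.
term = [v - mu_f for v in f]
g = [0.0] * len(f)
for _ in range(200):
    g = [a + b for a, b in zip(g, term)]
    term = mat_vec(P, term)
Pg = mat_vec(P, g)

# Poisson equation: f - mu(f) = g - Pg, checked state by state.
poisson_residual = max(abs((f[x] - mu_f) - (g[x] - Pg[x])) for x in range(len(f)))

path = [0, 1, 1, 2, 1, 0, 0, 1, 2, 2]  # X_0, ..., X_n
n = len(path) - 1
a = [1.0 - k / (2.0 * n) for k in range(n + 1)]  # weights a_{n,0}, ..., a_{n,n}

S_n = sum(a[k] * (f[path[k]] - mu_f) for k in range(1, n + 1))
martingale_part = sum(a[k] * (g[path[k]] - Pg[path[k - 1]]) for k in range(1, n + 1))
increment_part = sum((a[k] - a[k - 1]) * Pg[path[k - 1]] for k in range(1, n + 1))
boundary_part = a[0] * Pg[path[0]] - a[n] * Pg[path[n]]
print(S_n, martingale_part + increment_part + boundary_part)
```

The two printed numbers agree up to rounding. Only the martingale part requires a probabilistic argument; the other two terms are controlled directly through the bounds on $Pg_n$ and on the increments of the weights.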

Remark 7. An important special case is the case where an/ = 1 and sup„>oIE {y^{Xn)) < 
oo. In this case it is enough to have n~'^~^^^'^ Vi ~^ addition it is true that 

^^PxgA" PVi {x) /Vf {x) < oo, then clearly |||/n|||p < c|/„|vi and the law of large number 
holds if $n^{-1+1/p}\,|f_n|_{V_1} \to 0$.